
Atlas search lookups #325

Open · wants to merge 30 commits into main

@WaVEV (Collaborator) commented Jun 24, 2025

This PR adds the initial implementation of the Atlas Search operators.

Tasks:

  • Operators
  • Combinable
  • Vector search
  • Score
  • Docs
  • EmbeddedDocument operator

@WaVEV force-pushed the atlas-search-lookups branch from 449b6a3 to ca8a7cf June 26, 2025 02:56
@WaVEV force-pushed the atlas-search-lookups branch 3 times, most recently from 9935b25 to a467a57 July 12, 2025 23:32
@WaVEV changed the title from "[WIP] Atlas search lookups" to "Atlas search lookups" Jul 14, 2025
@WaVEV force-pushed the atlas-search-lookups branch 4 times, most recently from ea2118b to 206b554 July 21, 2025 19:29
@WaVEV force-pushed the atlas-search-lookups branch 4 times, most recently from 456028d to 65f22e6 July 22, 2025 05:16
@WaVEV marked this pull request as ready for review July 24, 2025 19:39
@WaVEV force-pushed the atlas-search-lookups branch from eb6eb07 to e7f4d22 July 26, 2025 02:40
@WaVEV force-pushed the atlas-search-lookups branch from 0fdb066 to eed2499 August 5, 2025 00:25
@WaVEV force-pushed the atlas-search-lookups branch from eed2499 to 99f6548 August 5, 2025 13:35
Comment on lines 51 to 54
``SearchEquals`` objects can be reused and combined with other search
expressions.

See :ref:`search-operations-combinable`
Collaborator

I wonder if we could structure things so we don't need to repeat this boilerplate on every(?) expression.

Collaborator Author

🤔 I don't think we can avoid it, unless we list the operations that can be combined in the combinable-operations section. I like having this link while I'm reading the docs, since it introduces some (cool?) behaviour.

Comment on lines 64 to 68
delayedAssertCountEqual = _delayed_assertion(timeout=2)(TransactionTestCase.assertCountEqual)
delayedAssertListEqual = _delayed_assertion(timeout=2)(TransactionTestCase.assertListEqual)
delayedAssertQuerySetEqual = _delayed_assertion(timeout=2)(
    TransactionTestCase.assertQuerySetEqual
)
Collaborator

Are the non-delayed versions ever used? Maybe it's better to overwrite the original names so we don't have to write "delayedXXXXX" everywhere. Or maybe the waiting could be done in setUp() after data is inserted? Unless some test inserts more data, essentially only the first test's waiting is needed, right?

Collaborator Author

No, all the checks are delayed.
Regarding the second question: right, any test that inserts data needs to wait. If the data is inserted once when the class is set up, we only need to wait once. So if we want to get rid of the delayed assertions, we can do the waiting in the data-creation step.



@skipUnlessDBFeature("supports_atlas_search")
class SearchEqualsTest(SearchUtilsMixin):
Collaborator

I've tried to be consistent in this project about using "Tests" (plural) in the class names.

Collaborator Author

🤔 Hmm, I didn't notice that. Will change.

Comment on lines 112 to 116
boost_score = SearchScoreOption({"boost": {"value": 3}})

qs = Article.objects.annotate(
    score=SearchEquals(path="headline", value="cross", score=boost_score)
)
Collaborator

I'd inline boost_score, or at least omit the blank line. (Only some tests are inconsistent.)

Contributor

@Jibola left a comment

Things look great, but I've gone through about half of the code (due to size). I will check the test code tomorrow!

Comment on lines 253 to 264
if not has_search:
    raise ValueError(
        "Cannot combine two `$vectorSearch` operators. "
        "If you need to combine them, consider restructuring your query logic or "
        "running them as separate queries."
    )
raise ValueError(
    "Only one $search operation is allowed per query. "
    f"Received {len(search_replacements)} search expressions. "
    "To combine multiple search expressions, use either a CompoundExpression for "
    "fine-grained control or CombinedSearchExpression for simple logical combinations."
)
Contributor

I think these two ValueErrors need to be switched.

Collaborator Author

🤔 The second is the case where has_vector_search is true but there is no search. I think I should refactor this; it's a bit confusing, and the not at the beginning doesn't help.
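One hypothetical way to do that refactor is to test each condition by name instead of branching on `not has_search`. This sketch assumes the rule is simply "at most one $search and at most one $vectorSearch per query" and takes plain counts as input; the PR's real inputs (`search_replacements`, `has_search`) aren't fully shown here, and whether $search and $vectorSearch may be mixed in one query is deliberately left out. `check_search_stages` is an illustrative name, not the PR's API.

```python
def check_search_stages(search_count: int, vector_search_count: int) -> None:
    """Hypothetical sketch: reject queries with more than one stage of either kind."""
    # Each branch names the condition it checks, so the error messages
    # can't end up attached to the wrong case.
    if vector_search_count > 1:
        raise ValueError(
            "Cannot combine two `$vectorSearch` operators. "
            "If you need to combine them, consider restructuring your query logic or "
            "running them as separate queries."
        )
    if search_count > 1:
        raise ValueError(
            "Only one $search operation is allowed per query. "
            f"Received {search_count} search expressions. "
            "To combine multiple search expressions, use either a CompoundExpression for "
            "fine-grained control or CombinedSearchExpression for simple logical combinations."
        )
```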

Comment on lines 869 to 871
# Apply De Morgan's Laws.
operator = node.operator.negate() if negated else node.operator
negated = negated != (node.operator == Operator.NOT)
Contributor

This logic is a little confusing because it requires some understanding of negate and the state changes.
I'll leave this as a comment here to be reviewed later.

What's an example of a NOT combinable?
I.e., how would I construct NOT (A AND B) or can this only be done via negate?

Collaborator Author

I applied De Morgan's laws to push negation down to the operands, so every expression ends up in the form A' operator B', where A' and B' are A and B or their negations. So:
NOT (A AND B) = NOT A OR NOT B => {SHOULD: [MUST_NOT(A), MUST_NOT(B)]} with minimumShouldMatch set to 1.
The other way to handle this is to push everything into a MUST_NOT, but to reduce NOT (NOT A) to A, I decided to apply these simplifications.
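For reference, the identities this rewrite relies on, written in standard notation alongside the Atlas clause each one maps to in this thread:

```latex
\begin{aligned}
\neg(A \land B) &\equiv \neg A \lor \neg B && \Rightarrow \text{should: [mustNot(A), mustNot(B)], minimumShouldMatch} = 1 \\
\neg(A \lor B)  &\equiv \neg A \land \neg B && \Rightarrow \text{must: [mustNot(A), mustNot(B)]} \\
\neg\neg A      &\equiv A
\end{aligned}
```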

Collaborator Author

And, I almost forgot: we can also handle double negation on SHOULD (which is how ORs are expressed when minimumShouldMatch is 1). Long story short:
A and B => MUST
not C => MUST_NOT
A or B => SHOULD with minimumShouldMatch = 1
not (A or B) => not A and not B => MUST(MUST_NOT(A), MUST_NOT(B))

When A, B, C are atomic.
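To make the mapping above concrete, here is a small, self-contained sketch that pushes NOT down to the atomic operators and emits Atlas `compound` clauses (`must`, `mustNot`, and `should` with `minimumShouldMatch: 1`). The `Leaf`/`Not`/`And`/`Or` classes and `to_compound()` are hypothetical illustrations, not the PR's code, which works on Django's `node.operator` and `Operator.NOT`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    """An atomic search operator, e.g. {"equals": {...}} (hypothetical)."""
    spec: dict


@dataclass(frozen=True)
class Not:
    child: "Node"


@dataclass(frozen=True)
class And:
    children: tuple


@dataclass(frozen=True)
class Or:
    children: tuple


Node = Union[Leaf, Not, And, Or]


def to_compound(node: Node, negated: bool = False) -> dict:
    """Translate a boolean tree into Atlas compound clauses, pushing NOT down."""
    if isinstance(node, Not):
        # Flip the pending negation; NOT (NOT A) therefore collapses to A.
        return to_compound(node.child, not negated)
    if isinstance(node, Leaf):
        return {"compound": {"mustNot": [node.spec]}} if negated else node.spec
    # De Morgan: a negated AND becomes an OR of negations, and vice versa.
    is_and = isinstance(node, And) != negated
    clauses = [to_compound(child, negated) for child in node.children]
    if is_and:
        return {"compound": {"must": clauses}}
    return {"compound": {"should": clauses, "minimumShouldMatch": 1}}
```

Following the thread, NOT (A OR B) becomes a `must` of single-clause `mustNot` compounds; an implementation could instead flatten that into one `mustNot: [A, B]`, which matches the same documents.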

Collaborator Author

I refactored it a bit. It's simpler now, though I don't know if it's simple enough 😄

Contributor

@Jibola left a comment

Overall PR looks great! I've got some minor corrections, but other than that, it is good to merge from me. Great work! 🚀

It also looks like there's a ReadTheDocs error:

/home/docs/checkouts/readthedocs.org/user_builds/django-mongodb-backend/checkouts/325/docs/source/ref/models/search.rst:654: WARNING: unknown document: 'atlas:atlas-search/scoring/' [ref.doc]

@@ -16,6 +16,12 @@ New features
- Added :class:`~.fields.PolymorphicEmbeddedModelField` and
  :class:`~.fields.PolymorphicEmbeddedModelArrayField` for storing a model
  instance or list of model instances that may be of more than one model class.
- Added support for MongoDB Atlas Search expressions, including
  ``SearchAutocomplete``, :class:`.SearchEquals`, ``SearchVector``, and others.
Contributor

Suggested change
``SearchAutocomplete``, :class:`.SearchEquals`, ``SearchVector``, and others.
``SearchAutocomplete``, :class:`SearchEquals`, ``SearchVector``, and others.

Collaborator

This suggestion isn't correct. Without the leading dot, the class won't be resolved properly.

Collaborator Author

🤔 Maybe I forgot to add the ~.expressions prefix. But I don't know much about docs.

Collaborator

It works as is. With the leading dot, if there's no exact match, Sphinx treats the name as a suffix and searches for objects ending in SearchEquals, so the full path isn't needed.

def create_search_index(cls, model, index_name, definition, type="search"):
    collection = cls._get_collection(model)
    idx = SearchIndexModel(definition=definition, name=index_name, type=type)
    collection.create_search_index(idx)
Contributor

@Jibola Aug 6, 2025

NIT: For the sake of testing, we can make this a blocking call and check for the index before continuing.

Collaborator Author

🤔 If I understand correctly, I need to add a wait-for-predicate step, right?
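A minimal sketch of the blocking variant, assuming PyMongo's `Collection.create_search_index()` (which returns the new index's name) and `Collection.list_search_indexes()`, and assuming Atlas reports a `queryable` flag on each index document once the index can serve queries. `wait_for`, `index_is_queryable`, and `create_search_index_and_wait` are hypothetical helper names, not part of the PR or of PyMongo.

```python
import time


def wait_for(predicate, timeout=60.0, interval=1.0):
    """Poll predicate() until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(interval)


def index_is_queryable(collection, index_name):
    """Return True once the named search index reports queryable: true."""
    return any(
        index.get("queryable") for index in collection.list_search_indexes(index_name)
    )


def create_search_index_and_wait(collection, index_model, timeout=60.0, interval=1.0):
    """Create a search index (e.g. from a SearchIndexModel) and block until it's usable."""
    index_name = collection.create_search_index(index_model)
    wait_for(lambda: index_is_queryable(collection, index_name), timeout, interval)
    return index_name
```

In the PR's helper, the last line would become `create_search_index_and_wait(collection, idx)`, with `idx` built from `SearchIndexModel` as before.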

Labels: None yet
Projects: None yet

3 participants